Using Variational Autoencoders for Item Response Theory
MIT
December 1, 2025
Researchers with discrete data (parliamentary votes, campaign websites, survey responses) often want to estimate latent traits (ideology, policy positions, personality)
One common choice is Item Response Theory (IRT), a parametric approach
The canonical “2-parameter” IRT model: \[\pi_{ij} = \Pr(y_{ij} = 1 \mid \alpha_i, \gamma_{j}, \beta_{j}) = F(\underbrace{\alpha_i}_{\text{Ideal Point}} \cdot \underbrace{\gamma_{j}}_{\text{Discrimination}} - \underbrace{\beta_{j}}_{\text{Difficulty}}),\] where \(F(\cdot)\) is a monotone link function (e.g., the inverse logit for binary responses, or a softmax for multi-category responses), \(i\) indexes respondents, and \(j\) indexes questions. Multiple latent dimensions are also supported.
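As a concrete sketch with an inverse-logit link (the function name and parameter values here are illustrative, not from the poster):

```python
import math

def irt_2pl_prob(alpha, gamma, beta):
    """2-parameter IRT response probability with an inverse-logit link.
    alpha: ideal point, gamma: item discrimination, beta: item difficulty."""
    return 1.0 / (1.0 + math.exp(-(alpha * gamma - beta)))

# At the item's indifference point (alpha * gamma == beta) the
# response probability is exactly 0.5:
print(irt_2pl_prob(alpha=1.0, gamma=2.0, beta=2.0))  # 0.5
```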
Construct a likelihood, \(\mathcal{L} = \prod_{i=1}^N \prod_{j=1}^J \pi_{ij}^{y_{ij}}(1-\pi_{ij})^{1-y_{ij}}\), place a prior on the latent traits, \(\alpha_i \sim \mathcal{N}(0, 1)\), and solve for the posterior: \(f(\alpha_i \mid \mathbf{y}_i) \propto \mathcal{L}_i\, f(\alpha_i)\), where \(\mathcal{L}_i\) is respondent \(i\)'s contribution to the likelihood
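A minimal sketch of the (log) likelihood construction, summing \(y_{ij}\log\pi_{ij} + (1-y_{ij})\log(1-\pi_{ij})\) over respondents and items (names and the toy data are illustrative):

```python
import math

def bernoulli_log_likelihood(y, pi):
    """log L = sum_i sum_j [ y_ij * log(pi_ij) + (1 - y_ij) * log(1 - pi_ij) ].
    y: nested lists of 0/1 responses; pi: matching response probabilities."""
    return sum(
        y_ij * math.log(pi_ij) + (1 - y_ij) * math.log(1 - pi_ij)
        for y_i, pi_i in zip(y, pi)
        for y_ij, pi_ij in zip(y_i, pi_i)
    )

y = [[1, 0], [1, 1]]                 # 2 respondents x 2 items
pi = [[0.9, 0.2], [0.8, 0.7]]        # model-implied response probabilities
ll = bernoulli_log_likelihood(y, pi)  # log of the product of observed-response probs
```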
Current standards in political science:
My proposal from psychometrics and computer science:
Instead of estimating a separate \(\hat g(\cdot)\) for each individual, estimate a single, flexible model that predicts the posterior distribution for an individual given their responses.
We model latent traits \(\alpha_i \sim \text{MVN}(\mu_i, \sigma_i)\) where \((\mu_i, \sigma_i) = \text{NeuralNet}_\mathbf{\phi} (\mathbf{y}_i)\)
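A minimal sketch of the amortized encoder, with a single linear layer standing in for \(\text{NeuralNet}_\phi\) (all names, sizes, and weights here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, D = 10, 1                                   # items, latent dimensions

# Toy encoder weights; in practice these are the learned parameters phi.
W_mu = rng.normal(scale=0.1, size=(J, D)); b_mu = np.zeros(D)
W_ls = rng.normal(scale=0.1, size=(J, D)); b_ls = np.zeros(D)

def encode(y_i):
    """Map a respondent's response vector y_i to (mu_i, sigma_i), the
    parameters of their approximate posterior over the latent trait.
    One linear layer here; the real encoder would be a deeper network."""
    mu = y_i @ W_mu + b_mu
    sigma = np.exp(y_i @ W_ls + b_ls)          # log-parameterized for positivity
    return mu, sigma

y_i = rng.integers(0, 2, size=J).astype(float)  # one respondent's 0/1 responses
mu_i, sigma_i = encode(y_i)
```

The key design point: every respondent shares the same encoder weights, so adding a respondent adds no new inference-time parameters.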
Combining these pieces yields a Variational Autoencoder (Kingma and Welling 2014), a kind of structured neural network model. Parameters are estimated by:
\[\begin{align*} \left( \hat{\mathbf{\phi}}, \hat \gamma, \hat \beta \right) &= \underset{\mathbf{\phi}, \gamma, \beta}{\operatorname{argmax}} \left( \mathbb{E}_{g}\left[ \log p(\mathbf{y}_i \mid \alpha_i ; \gamma_j, \beta_j) \right] - \text{KL}\left[ g(\alpha_i \mid \mathbf{y}_i; \mathbf{\phi}) \,\|\, f(\alpha_i) \right] \right) \\ &= \underset{\mathbf{\phi}, \gamma, \beta}{\operatorname{argmax}} \ \text{Evidence Lower Bound (ELBO)} \end{align*}\]
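A one-sample Monte Carlo sketch of the ELBO for a single respondent, assuming a one-dimensional trait, the 2PL Bernoulli likelihood, and a \(\mathcal{N}(0,1)\) prior (so the KL term has its standard closed form); all names and toy values are illustrative:

```python
import numpy as np

def elbo_one_sample(y_i, mu_i, sigma_i, gamma, beta, rng):
    """Single-draw ELBO estimate: reconstruction term minus KL term.
    Reconstruction: Bernoulli log-likelihood at a reparameterized draw of alpha.
    KL: closed form between N(mu, sigma^2) and the N(0, 1) prior."""
    eps = rng.normal(size=mu_i.shape)
    alpha_i = mu_i + sigma_i * eps                 # reparameterization trick
    pi = 1.0 / (1.0 + np.exp(-(alpha_i * gamma - beta)))
    recon = np.sum(y_i * np.log(pi) + (1.0 - y_i) * np.log(1.0 - pi))
    kl = 0.5 * np.sum(sigma_i**2 + mu_i**2 - 1.0 - 2.0 * np.log(sigma_i))
    return recon - kl

rng = np.random.default_rng(0)
J = 10
y_i = rng.integers(0, 2, size=J).astype(float)     # one respondent's responses
gamma = rng.normal(size=J); beta = rng.normal(size=J)
mu_i, sigma_i = np.array([0.0]), np.array([1.0])   # KL is zero at the prior
elbo = elbo_one_sample(y_i, mu_i, sigma_i, gamma, beta, rng)
```

In practice the ELBO is averaged over a minibatch of respondents and maximized with stochastic gradients; the reparameterized draw is what lets gradients flow through the encoder parameters \(\phi\).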
Missing data is common in IRT applications (e.g., legislators don’t vote on every bill, survey respondents skip questions, voters don’t cast a ballot in every contest)
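One common way to handle this (a sketch, not necessarily the poster's implementation) is to mask missing cells out of the reconstruction term, so unobserved responses contribute nothing to the objective:

```python
import numpy as np

def masked_log_likelihood(y, pi, mask):
    """Bernoulli log-likelihood over observed cells only.
    mask[i, j] = 1 where y[i, j] was observed, 0 where it is missing
    (missing cells of y hold an arbitrary placeholder value)."""
    ll = y * np.log(pi) + (1.0 - y) * np.log(1.0 - pi)
    return np.sum(ll * mask)

y    = np.array([[1.0, 0.0], [1.0, 1.0]])
pi   = np.array([[0.9, 0.2], [0.8, 0.7]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])   # respondent 0 skipped item 1
ll = masked_log_likelihood(y, pi, mask)
```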
A bespoke model built in PyTorch, fit on MIT’s Engaging Cluster for N epochs until satisfactory convergence was reached
Mason Reece | mpreece@mit.edu
VAEs for IRT